-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Fix the benchmarks #1001
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix the benchmarks #1001
Conversation
protected virtual Job GetJobDefinition() | ||
=> Job.Default | ||
.WithWarmupCount(1) // ML.NET benchmarks are typically CPU-heavy benchmarks, 1 warmup is usually enough | ||
.WithMaxIterationCount(20); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adamsitnik I think we should make the default configuration as the train config as most of the tests are running in the train configuration ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Anipik thank you for the suggestion! The TrainConfig
is a very specific config, I would prefer to not make it a default one
nuget.config
Outdated
<configuration> | ||
<packageSources> | ||
<!--To inherit the global NuGet package sources remove the <clear/> line below --> | ||
<clear /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eerhardt will this affect anything else, or will they all keep using the MSBuild source?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to not have this list duplicated. NuGet.configs are not great because they cannot be modified (either adding new sources or removing sources, or changing locations) without actually modifying the file.
Instead, would it be possible to:
- pull the
RestoreSources
property out ofDirectory.Build.props
and put it in a separate file, saybuild\NuGetSources.props
.Directory.Build.props
imports that file. - Change the Benchmark toolchain to import the
build\NuGetSources.props
file.
That way we don't need to duplicate this list.
In reply to: 220023911 [](ancestors = 220023911)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eerhardt I have changed the way it works: now our project generator generates a simple .csproj file which includes the native dependencies and does not ignore Directory.Build.props
file so BDN is using the nuget feeds defined in Directory.Build.props
…ion which is expensive for long running benchmarks
I disabled the MemoryDiagnoser, now it can be enabled on demand via console args: |
… nuget.config file issue
…, makes all benchmark work again ;)
@Anipik I have registered all the required assemblies so the benchmarks work again @danmosemsft I added one integration test that runs once simple benchmark that has a dependency to native dll and checks if the benchmark executed correctly. I don't want the benchmarks to be broken again ;) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good @adamsitnik. Thanks for enhancing these benchmarks. I just had a few random questions/thoughts.
@@ -19,4 +23,39 @@ internal class EmptyWriter : TextWriter | |||
internal static readonly EmptyWriter Instance = new EmptyWriter(); | |||
public override Encoding Encoding => null; | |||
} | |||
|
|||
internal static class EnvironmentFactory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) can this go in its own file?
{ | ||
var environment = new ConsoleEnvironment(verbose: false, sensitivity: MessageSensitivity.None, outWriter: EmptyWriter.Instance); | ||
|
||
environment.ComponentCatalog.RegisterAssembly(typeof(TLoader).Assembly); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could just have a single extension method on IHostEnvironment that just registered all types/assemblies. Then callers wouldn't need to pass in the TTypes.
For example, like this
public static TEnvironment AddStandardComponents<TEnvironment>(this TEnvironment env) | |
where TEnvironment : IHostEnvironment | |
{ | |
env.ComponentCatalog.RegisterAssembly(typeof(TextLoader).Assembly); // ML.Data | |
env.ComponentCatalog.RegisterAssembly(typeof(LinearPredictor).Assembly); // ML.StandardLearners | |
env.ComponentCatalog.RegisterAssembly(typeof(CategoricalTransform).Assembly); // ML.Transforms | |
env.ComponentCatalog.RegisterAssembly(typeof(FastTreeBinaryPredictor).Assembly); // ML.FastTree | |
env.ComponentCatalog.RegisterAssembly(typeof(EnsemblePredictor).Assembly); // ML.Ensemble | |
env.ComponentCatalog.RegisterAssembly(typeof(KMeansPredictor).Assembly); // ML.KMeansClustering | |
env.ComponentCatalog.RegisterAssembly(typeof(PcaPredictor).Assembly); // ML.PCA | |
env.ComponentCatalog.RegisterAssembly(typeof(Experiment).Assembly); // ML.Legacy | |
return env; | |
} |
and you wouldn't need two Create methods. Just one that creates the environment and registers all the assemblies.
|
||
private ITestOutputHelper Output { get; } | ||
|
||
[Fact(Skip = SkipTheDebug)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I fully understand the goal of this test. It is just testing that copying native dependencies in a BDN project works correctly? It won't catch breaks in our actual benchmark tests, right?
I'm sort of wondering if this test is valuable going forward or not.... What kind of changes is it guarding against?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal of this test is to make sure that we can build and run the benchmarks. So far they got broken few times, an example: nuget.config
file was removed, BDN was ignoring the Directory.Builds.props
file which contained the nuget feeds list and it was failing to restore one of the dependencies. Few people pinged me with the exact same problem.
With this test I am confident that there will be no breaking changes for the benchmarks.
I would love to run the benchmarks as part of the test, however, they need a lot of time to execute. Some even 300s for single benchmark invocation
<ItemGroup> | ||
<ProjectReference Include="..\Microsoft.ML.Benchmarks\Microsoft.ML.Benchmarks.csproj" /> | ||
|
||
<NativeAssemblyReference Include="FastTreeNative" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we using FastTree in the test?
{ | ||
Add(DefaultConfig.Instance); // this config contains all of the basic settings (exporters, columns etc) | ||
|
||
Add(GetJobDefinition() // jod defines how many times given benchmark should be executed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jod
type-o
</ItemGroup> | ||
<ItemGroup> | ||
<ProjectReference Include=""{GetProjectFilePath(buildPartition.RepresentativeBenchmarkCase.Descriptor.Type, logger).FullName}"" /> | ||
<NativeAssemblyReference Include=""CpuMathNative"" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to include all of the <NativeAssemblyReference
from the calling BDN project instead of hard-coding them here? That way they only need to be defined in ML.Benchmarks.csproj
.
…don't hardcode the dependencies
{ | ||
public static string DatasetNotFound = "Could not find {0} Please ensure you have run 'build.cmd -- /t:DownloadExternalTestFiles /p:IncludeBenchmarkData=true' from the root"; | ||
} | ||
|
||
// Adding this class to not print anything to the console. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Anipik or @adamsitnik - FYI - whoever is in the benchmark tests again. A new class that just came in is the LocalEnvironment
, which doesn't print to the console by default. We could remove this class and switch our environment from ConsoleEnvironment
to LocalEnvironment
. That will work around the issue as well, and is simpler.
nuget.config
file to work. (I know it's not perfect)RecommendedConfig
as default and when[TrainConfig]
is used it overwrites theRecommendedConfig
. Few types were missing config ([Config(typeof(SomeConfig))]
), and they were using the default config fromBenchmarkDotNet